Predicting Sporadic Grid Data Transfers

Authors

  • Sudharshan S. Vazhkudai
  • Jennifer M. Schopf

Abstract

The increasingly common practice of (1) replicating datasets and (2) using resources as distributed data stores in Grid environments has led to the problem of determining which replica can be accessed most efficiently. Because the components in the end-to-end path linking these locations have diverse performance characteristics and varying loads, selecting one replica location from among many requires accurate predictions of end-to-end data transfer times between sources and sinks. In this paper, we present a prediction system that combines end-to-end application throughput observations with network load variations, drawing on their respective strengths: the former captures whole-system performance, and the latter captures variations in load patterns. We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times. We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe gains of up to 10% in prediction accuracy compared with approaches based on past system behavior in isolation.
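The abstract describes regression models that relate network load measurements to observed file transfer throughput. The block below is a minimal sketch of that idea, not the authors' exact model set. It assumes a single ordinary least-squares regression of transfer throughput on the most recent network probe value taken before each transfer, since transfers are sporadic while probes are periodic. The function names (`align`, `fit_linear`, `predict_throughput`, `predict_mean`, `transfer_time`) and the units (MB, MB/s) are hypothetical. The mean of past throughputs stands in for a predictor based on past system behavior in isolation.

```python
from bisect import bisect_right
from statistics import fmean


def align(probes, transfers):
    """Hypothetical helper: pair each sporadic transfer with the most recent
    network probe taken at or before it.

    probes: list of (time, probe_value) sorted by time
    transfers: list of (time, observed_throughput)
    Transfers that occur before the first probe are skipped.
    """
    times = [t for t, _ in probes]
    xs, ys = [], []
    for t, throughput in transfers:
        i = bisect_right(times, t) - 1  # index of latest probe with time <= t
        if i >= 0:
            xs.append(probes[i][1])
            ys.append(throughput)
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("probe values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_throughput(probe_history, transfer_history, current_probe):
    """Regression-based prediction: adjust for the current network load."""
    a, b = fit_linear(probe_history, transfer_history)
    return a + b * current_probe


def predict_mean(transfer_history):
    """Baseline: predict from past transfer throughputs alone."""
    return fmean(transfer_history)


def transfer_time(file_size_mb, throughput_mb_s):
    """Convert a predicted throughput (MB/s) into a transfer time (s)."""
    return file_size_mb / throughput_mb_s


if __name__ == "__main__":
    probes = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0)]
    transfers = [(5, 2.0), (12, 4.0), (25, 6.0), (31, 8.0)]
    xs, ys = align(probes, transfers)
    print(predict_throughput(xs, ys, current_probe=5.0))  # 10.0
    print(predict_mean(ys))                               # 5.0
    print(transfer_time(100.0, 10.0))                     # 10.0
```

In this toy data the regression tracks the rising probe value and predicts 10.0 MB/s, whereas the historical mean stays at 5.0 MB/s; this is the kind of gap that combining load measurements with throughput history is meant to close.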


Similar resources

Predicting Scientific Grid Data Transfer Characteristics

Big data scientists routinely transfer massive amounts of data. By understanding and modelling different aspects of these data transfers, we can make using big data more efficient and user-friendly. In this paper, we first develop a set of data storage location prediction heuristics. These heuristics help big data scientists manage and discover locations to transfer their data from and to. We s...

Evaluating and Enhancing the Use of the GridFTP Protocol for Efficient Data Transfer on the Grid

Grid applications often require large data transfers along heterogeneous networks having different latencies and bandwidths, therefore efficient support for data transfer is a key issue in Grid computing. The paper presents a performance evaluation of the GridFTP protocol along some typical network scenarios, giving indications and rules of thumb useful to select the “best” GridFTP parameters. ...

GridTorrent: Optimizing data transfers in the Grid with collaborative sharing

As Grid systems expand and become more and more popular, there is a growing need for efficient, scalable and robust data transfer mechanisms that can deal effectively with large file transfers and flash crowd situations. In this paper, we address the problem of data transfer optimization by presenting GridTorrent, a modified BitTorrent protocol, tightly coupled with modern Grid middleware compon...

The CMS PhEDEx System: a Novel Approach to Robust Grid Data Distribution

The CMS experiment has taken a novel approach to Grid data distribution. Instead of having a central processing component making global decisions on replica allocation, CMS has a data management layer composed of a series of collaborating agents; the agents are persistent, stateless processes which manage specific parts of replication operations at each site in the distribution network. The age...

Network-Aware HEFT Scheduling for Grid

We present a network-aware HEFT. The original HEFT does not account for parallel network flows when designing its schedule for a computational environment where computing nodes are physically at distant locations. In the proposed mechanism, such data transfers are stretched to their realistic completion time. A HEFT schedule with stretched data transfers exhibits the realistic makespan of the...

Publication date: 2002